ABSTRACT

Peptide-protein interactions are among the most pre- 
valent and important interactions in the cell, but a 
large fraction of those interactions lack detailed str- 
uctural characterization. The Rosetta FlexPepDock 
web server (http://flexpepdock.furmanlab.cs.huji.ac.il/)
provides an interface to a high-resolution
peptide docking (refinement) protocol for the model- 
ing of peptide-protein complexes, implemented 
within the Rosetta framework. Given a protein recep- 
tor structure and an approximate, possibly inaccur- 
ate model of the peptide within the receptor binding 
site, the FlexPepDock server refines the peptide to 
high resolution, allowing full flexibility to the peptide 
backbone and to all side chains. This protocol was 
extensively tested and benchmarked on a wide array 
of non-redundant peptide-protein complexes, and 
was proven effective when applied to peptide starting
conformations within 5.5 Å backbone root mean
square deviation from the native conformation.
FlexPepDock has been applied to several systems 
that are mediated and regulated by peptide-protein 
interactions. This easy to use and general web ser- 
ver interface allows non-expert users to accurately 
model their specific peptide-protein interaction of 
interest. 

INTRODUCTION 

Protein-protein interactions facilitate most cellular pro- 
cesses. It has lately become apparent that a significant 
fraction of these interactions are mediated by peptide- 
protein interactions, which involve the binding of a 
linear, unfolded peptide stretch onto a globular protein 
receptor (1-3). Peptide-mediated interactions indeed play 
key roles in major cellular processes, predominantly in
signaling and regulatory networks that require short-lived
signals (4), and also in cell localization, protein degrad- 
ation and immune response (3,4). However, despite their 
importance and estimated abundance, peptide-protein 
complexes are underrepresented among solved structures 
(5,6). Therefore, protocols that can provide accurate struc- 
tural models of peptide-protein interactions represent an 
essential tool for the molecular understanding of the 
cellular network of interactions (7). These models can 
then be used as ideal starting points for targeted compu- 
tational and experimental modulation of interactions 
(8,9). For many real-life peptide docking problems, 
coarse-grain models can often be obtained from
complexes with alternative peptides, unbound structures 
or homology models where existing structures provide ap- 
proximate structural information about the receptor and 
the peptide or the location of the binding site [e.g. peptides 
that bind to MHC, SH3, WW or PDZ domains (10-13)]. 

Rosetta FlexPepDock (14) is a high-resolution protocol
for the refinement of peptide-protein complex structures 
that is implemented in the Rosetta modeling suite frame- 
work (15). Starting from a coarse model of the interaction, 
FlexPepDock uses a Monte Carlo-with-minimization
approach to refine all of the peptide's degrees of
freedom (rigid body orientation, backbone and side
chain flexibility) as well as the side chain conformations
of the protein receptor. The Rosetta FlexPepDock web
server described here provides a simple interface for the 
usage of this protocol, and by this aims to increase the 
accessibility of structural models of peptide-protein inter- 
actions to a broad range of scientists. 

While a plethora of web servers is available for the 
docking of a pair of globular proteins [e.g. RosettaDock 
(16), HADDOCK (17), PatchDock (18), ClusPro (19) and 
more; see CAPRI (20)], these are not intended for the 
docking of peptides. In particular, they do not consider 
the flexibility of the protein backbone during the docking 
process, and are thus not suitable for the docking of 
flexible peptides. Web servers are also available for 
small-molecule docking [e.g. Autodock (21), DOCK (22),
PatchDock (18), ParDock (23), MEDdock (24) and 
others]. These servers, however, are suitable only for molecules
with a limited number of rotatable bonds, and are therefore
not applicable to peptides, which typically contain
many more internal degrees of freedom than small mol- 
ecules (14,25). Other servers might identify the rough 
orientation of the peptide (and can serve as a complemen- 
tary, preliminary step to FlexPepDock), but do not actu- 
ally model the peptide-protein complex. These include 
CASTp (26), which aims at detecting pockets on protein 
surfaces [we previously showed that this feature correlates 
with peptide binding sites (5)], and PepSite (27), which 
predicts peptide binding sites and provides a coarse prediction
of specific peptide residue locations. Finally, other
software packages that model peptide-protein complexes, such as
DynaDock (28), or system-specific software for modeling,
e.g. PDZ-peptide interactions (29) or MHC-peptide interactions
(30), are to our knowledge not accessible to the
public in the form of a web server. Consequently, the
Rosetta FlexPepDock web server presented here is cur- 
rently the only server that allows for high-resolution 
modeling of peptide-protein interactions. 

The performance of Rosetta FlexPepDock has been ex- 
tensively tested against a large set of perturbed peptide- 
protein complexes and an effective range of sampling was 
defined (14). Table 1 summarizes the performance of 
FlexPepDock over a bound docking benchmark that 
covers a wide range of increasingly divergent starting 
peptide conformations. More analyses of its performance 
can be found in Raveh et al. (14). For peptides with initial 
backbone (bb) root mean square deviation (RMSD) of up 
to 5.5 Å, FlexPepDock is able to create near-native models
(peptide bb-RMSD <2 Å) in 91% of the cases for the
bound receptor, and rank them as one of the top five 
models in 78%. Moreover, the side chains of key residues
in binding motifs are modeled particularly well, typically
within 1 Å of their native conformations (14). In the
challenging task of unbound docking, near-native
models were sampled in 85% of the cases and ranked correctly
in 59% (for starting structures within 5.5 Å
bb-RMSD from the native conformation).

Table 1. FlexPepDock performance as a function of the starting

a We measure the performance by two success criteria: a model is
considered successful if the peptide interface bb-RMSD to native is <1 Å
(sub-angstrom) or <2 Å (near-native).

b Starting structures were binned according to the starting peptide
conformation bb-RMSD. In this case, the bound receptor was used for
docking.

c Performance when considering just the Top 1 ranking model by
energy, Top 10 ranking models, or the entire sample of 200 models.
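The two success criteria used in the benchmark can be written down compactly. The following is a minimal sketch, assuming the backbone coordinates of the model and native peptides are already available as equal-length lists of (x, y, z) tuples in a common frame (for example after superimposing the receptors). The function names are illustrative and are not part of Rosetta.

```python
import math

def bb_rmsd(model, native):
    """RMSD between two equal-length lists of (x, y, z) backbone coordinates.

    No superposition is performed here: both peptides are assumed to be
    expressed in the same receptor frame already.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq_sum = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(sq_sum / len(model))

def classify(rmsd):
    """Label a model using the benchmark thresholds (<1 A and <2 A).

    A sub-angstrom model also satisfies the near-native criterion;
    the stricter label is returned in that case.
    """
    if rmsd < 1.0:
        return "sub-angstrom"
    if rmsd < 2.0:
        return "near-native"
    return "unsuccessful"
```

For instance, a model whose backbone atoms are all displaced by 1.5 Å from the native positions has a bb-RMSD of 1.5 Å and is labeled near-native.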

In cases where no information is available about the 
conformation of the peptide backbone, docking can be 
started from an extended peptide conformation. In a 
benchmark in which the peptide was docked starting 
from an ideal extended backbone conformation (±135°
for all φ/ψ angles) based on a single anchor residue,
near-native solutions could be sampled in 66% of the 71
non-helical complexes (31% <1 Å from native), and
ranked among the top five solutions in 49% of the cases
(24% for <1 Å from native).

Rosetta FlexPepDock was tested on peptides of length 
5-15 amino acids, and performance shows little to no 
dependency on the peptide length (see Supplementary 
Table S1). However, we have also repeatedly applied it
successfully to longer peptides. 

DESCRIPTION OF WEB SERVER 

The main input for the Rosetta FlexPepDock web server is 
a PDB (31) file of the estimated complex between the 
receptor (first chain) and the peptide (second chain). The 
server will dock the peptide starting from this initial con- 
formation. If the native conformation of the peptide lies 
within the effective range of the protocol (see above), it 
will most probably produce high-resolution models for 
this interaction. 

Using the default options, the server will perform 100
simulations in full-atom mode and 100 simulations that 
include a preceding low-resolution centroid-based 
optimization protocol [see Raveh et al. (14) for more 
details about the protocol]. It will then rank the total of 
200 created models by their Rosetta energy score and 
provide the user with the top 10 predicted models for 
this interaction, as well as their score and bb-RMSD 
from the starting conformation. In addition, a plot 
showing score versus RMSD for each of the created 200 
models provides information about overall sampling 
(Figure 1B).
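The ranking step described above can be reproduced from a downloaded score file. The following is a minimal sketch, assuming a Rosetta-style score file in which data lines start with "SCORE:" and the first such line names the columns. The column names total_score, rmsBB and description used in the example are assumptions and should be checked against the actual file.

```python
def parse_score_file(text):
    """Parse Rosetta-style score file text into a list of dicts (one per model)."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip e.g. a leading SEQUENCE: line
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line holds the column names
            continue
        rows.append(dict(zip(header, fields)))
    return rows

def top_models(rows, n=10, score_col="total_score"):
    """Return the n lowest-energy models (lower score is better)."""
    return sorted(rows, key=lambda r: float(r[score_col]))[:n]

# Hypothetical three-model score file, for illustration only.
example = """SEQUENCE:
SCORE: total_score rmsBB description
SCORE: -120.5 1.2 model_0001
SCORE: -131.0 0.8 model_0002
SCORE: -99.3 4.7 model_0003
"""
best = top_models(parse_score_file(example), n=2)
print([m["description"] for m in best])
```

Plotting the score column against the RMSD column for all rows gives the same kind of score-versus-RMSD view that the server provides.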

ADVANCED OPTIONS 

For more advanced runs, users are able to specify: 

• A reference PDB: the user can upload a reference PDB 
of the peptide-protein interaction. If so, RMSD values 
of the models will be calculated to the reference peptide 
conformation found in this file, rather than to the 
starting conformation. This is useful if for example a 
structure of a similar interaction is available. 

• A constraints file: the user can upload a file that specifies
distance constraints between different atoms in
the system. This allows users to incorporate previous
experimental knowledge and their intuitions into
the simulations. For instance, the distance between a
catalytic residue in the receptor and a modified residue
in the peptide, or the distances derived from cross-linking
experiments, can easily be enforced with this
setup. Another case in which constraints may be useful
is if the user wants to fix certain interactions that are
present in the starting model.

• The number of models created with or without the
low-resolution preoptimization stage: in cases of high
confidence in the initial peptide placement (e.g. when
only one point mutation is introduced into an existing
structure), the user might want to avoid the wider
sampling of this low-resolution stage and focus
the sampling on a closer range. When the initial placement
is less certain (e.g. point mutations in the
protein indicate a putative binding site, but the exact
orientation of the peptide is not known), more sampling
with this low-resolution stage might increase the
sampling range and allow the identification of the
correct conformation.

• Reported scoring terms in the output file: the user can
specify which specific Rosetta scoring terms will be reported
for the top 10 models, e.g. Lennard-Jones full
atom attractive and repulsive terms, the Lazaridis-Karplus
solvation term (32), hydrogen bonding score
(33) and others.

• Modified amino acids: currently, the server supports
the docking of peptides containing modified amino
acids (e.g. phosphorylated or acetylated residues) only
in high-resolution mode (i.e. without low-resolution
preoptimization). When submitting such complexes,
the user should consult the FAQ page for the exact format
of the modified residue. Other non-natural amino acids that
are unrecognized by the server will be ignored.

Figure 1. Results provided for an example peptide docking run.
(A) Graphical representation of the top 10 models (superimposed), as
well as more detailed figures of the top 5 models (second row). (B) Plot
of RMSD (x-axis) vs score (y-axis) of all models created by the
simulation run. Bottom panel: The top 10 models (PDB format
coordinates), as well as a score file can be downloaded via the
provided links. This example is based on a 4.9 Å bb-RMSD starting
conformation and is taken from line 6 in Table 1.
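As an illustration of the constraints file option, the sketch below writes a single distance constraint in the general Rosetta constraint file syntax (AtomPair with a HARMONIC function, given as target distance and standard deviation). It is an assumption that the server accepts exactly this syntax and residue numbering, so the Usage and FAQ page should be consulted first. The residue numbers and distance are hypothetical.

```python
def atom_pair_constraint(atom1, res1, atom2, res2, distance, sd=0.5):
    """Return one AtomPair constraint line with a HARMONIC function.

    distance: target distance in angstroms (x0 of the harmonic term)
    sd: tolerance of the harmonic term (illustrative default)
    """
    return f"AtomPair {atom1} {res1} {atom2} {res2} HARMONIC {distance:.2f} {sd:.2f}"

# Hypothetical example: keep a receptor CA atom (residue 57) within about
# 8 A of a peptide CA atom (residue 203), e.g. from a cross-linking result.
line = atom_pair_constraint("CA", 57, "CA", 203, 8.0)
print(line)
```

Several such lines, one per constraint, would make up the uploaded file.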



SUMMARY 

We describe here an easy-to-use web server interface to the 
Rosetta FlexPepDock protocol for the high-resolution 
modeling of peptide-protein interactions. We have recently
used FlexPepDock to successfully address several
'real world' modeling tasks (34-37), and we expect that
increasing its usability through this web server will open 
the door for a wide range of new systems and applications. 

We have recently extended the FlexPepDock protocol 
and introduced 'FlexPepDock ab-initio', a powerful proto- 
col for simultaneous de novo folding and docking of 
peptides at a known binding site that does not require 
an initial peptide backbone conformation. FlexPepDock 
ab-initio performed well on a benchmark of peptide-
protein interactions (38). This protocol is however 
computationally expensive and therefore not yet available 
on the web server. It can be downloaded as part of the 
next Rosetta release. 



METHODS 

Overview of the protocol 

Rosetta FlexPepDock is extensively described in Raveh 
et al. (14). We provide here a short overview of the proto- 
col. The first step in our protocol involves the 'pre- 
packing' of the input structure, to remove internal clashes: 
side chain conformations are optimized by determining 
the best rotamer combination for both the protein and 
the peptide separately. In order to create a single model, 
we conduct 10 outer cycles of optimization starting with a 
reduced repulsive van der Waals term and increased at- 
tractive van der Waals term. During refinement, the repul- 
sive and attractive terms are gradually ramped back 
towards their original values so that in the last cycle the 
energy function corresponds to the standard Rosetta 
score. Within each outer cycle, we first optimize the rigid 
body orientation between the protein and the peptide, and 
then optimize the peptide backbone for the new orienta- 
tion, both using Monte Carlo search with energy mini- 
mization. Side chain rotamers are recalculated for the 
interface on the fly.
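The ramping of the van der Waals terms over the 10 outer cycles can be pictured with a short sketch. Only the ramping schedule is shown here; the Monte Carlo moves and minimization are not modeled. The starting scale factors (0.1 and 2.0) and the linear interpolation are placeholder assumptions chosen for illustration; the actual values and schedule are given in Raveh et al. (14).

```python
def ramp_weights(n_cycles=10, rep_start=0.1, atr_start=2.0):
    """Yield (cycle, repulsive_scale, attractive_scale) for each outer cycle.

    The scales are multiplicative factors on the standard weights. The
    repulsive term starts reduced, the attractive term starts increased,
    and both reach 1.0 (the standard score) in the last cycle.
    rep_start, atr_start and the linear ramp are hypothetical.
    """
    for i in range(n_cycles):
        t = i / (n_cycles - 1) if n_cycles > 1 else 1.0
        rep = rep_start + t * (1.0 - rep_start)
        atr = atr_start + t * (1.0 - atr_start)
        yield i + 1, rep, atr

for cycle, rep, atr in ramp_weights():
    # In each outer cycle the protocol would first optimize the rigid body
    # orientation and then the peptide backbone under these scaled weights.
    print(f"cycle {cycle:2d}: repulsive x{rep:.2f}, attractive x{atr:.2f}")
```

Starting with a soft repulsive term lets the peptide pass through minor clashes early on, while the final cycle is scored with the unmodified energy function.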

Pre-optimization in low-resolution 

We provide an optional fast, low-resolution optimization 
step prior to the full atom optimization. In this step, side 
chains are represented as spherical centroids of variable 
size. Similar to the high-resolution protocol, the rigid body 
and peptide backbone degrees of freedom are optimized 
alternately for several cycles. In this low-resolution representation,
the sampling range is usually increased.

Rosetta infrastructure 

The server is based on Rosetta Release version 3.2 and 
implements the following command line: 

FlexPepDocking.release -database minirosetta_database
-s start.pdb -native ref.pdb -rbMCM -torsionsMCM
-ex1 -ex2aro -use_input_sc -unboundrot start.pdb
-scorefile score.sc -ignore_unrecognized_res -nstruct
200 [-lowres_preoptimize]
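For users who want to reproduce a server run locally, the command line above can be assembled programmatically. The following is a sketch only: the flags are those listed in the text (with the extra-rotamer option written as -ex1, the standard Rosetta spelling), while the function name, the fallback of -native to the starting structure when no reference is given, and the split into two runs of 100 models are assumptions about how the default 100 + 100 setup might be realized.

```python
def build_command(start_pdb, ref_pdb=None, nstruct=100, lowres_preoptimize=False,
                  scorefile="score.sc"):
    """Return the FlexPepDock argument list for one batch of models."""
    # Assumption: without a reference PDB, RMSD is measured to the start.
    native = ref_pdb if ref_pdb is not None else start_pdb
    cmd = [
        "FlexPepDocking.release",
        "-database", "minirosetta_database",
        "-s", start_pdb,
        "-native", native,
        "-rbMCM", "-torsionsMCM",
        "-ex1", "-ex2aro",
        "-use_input_sc",
        "-unboundrot", start_pdb,
        "-scorefile", scorefile,
        "-ignore_unrecognized_res",
        "-nstruct", str(nstruct),
    ]
    if lowres_preoptimize:
        cmd.append("-lowres_preoptimize")
    return cmd

# Default-like setup: 100 full-atom models and 100 models that include the
# low-resolution preoptimization stage (separate score files are hypothetical).
default_runs = [
    build_command("start.pdb", nstruct=100, scorefile="score_hires.sc"),
    build_command("start.pdb", nstruct=100, lowres_preoptimize=True,
                  scorefile="score_lowres.sc"),
]
for cmd in default_runs:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run on a machine with a matching Rosetta installation; the sketch itself only prints the commands.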

Queue page 

The life cycle of a modeling job submitted by the user goes 
through the following stages: (i) queued: waiting to be 
processed by the server; (ii) pre-packing: processing of 
the input file and repacking of the side chains in each 
monomer (protein and peptide) to remove internal 
clashes that are not related to intermolecular interactions; 
(iii) FlexPepDocking: the actual production run, i.e. creation
of the requested number of models by high-resolution 
refinement. This stage consumes the major part of the 
running time; (iv) processing results: creation of visual 
representation of results; (v) writing results: creation of 
the results page; (vi) completed: the job has ended successfully
or, alternatively, if the job failed, the user will receive an error
description by e-mail (and a corresponding message in the 
results page). 
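The job life cycle can be summarized as a simple linear sequence of states. The sketch below is purely illustrative and does not reflect the server's actual implementation; the class and function names are hypothetical, and failure handling is omitted.

```python
from enum import Enum

class JobStage(Enum):
    """Stages of a job, in the order listed on the queue page."""
    QUEUED = "queued"
    PRE_PACKING = "pre-packing"
    DOCKING = "FlexPepDocking"
    PROCESSING_RESULTS = "processing results"
    WRITING_RESULTS = "writing results"
    COMPLETED = "completed"

_ORDER = list(JobStage)  # Enum iteration preserves definition order

def next_stage(stage):
    """Return the stage that follows `stage`; COMPLETED has no successor."""
    idx = _ORDER.index(stage)
    if idx == len(_ORDER) - 1:
        raise ValueError("job is already completed")
    return _ORDER[idx + 1]

stage = JobStage.QUEUED
while stage is not JobStage.COMPLETED:
    print(stage.value)
    stage = next_stage(stage)
print(stage.value)
```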

Documentation 

In addition to an overview page that briefly describes
the underlying protocol and provides information
about prior benchmarking results
(http://flexpepdock.furmanlab.cs.huji.ac.il/overview.php), users can also
read more on the Usage and Frequently Asked
Questions (FAQ) page
(http://flexpepdock.furmanlab.cs.huji.ac.il/usage.php), which provides details about the
input and output of the server and answers
the most commonly anticipated problems. Finally, results
of a demo run are also available
(http://flexpepdock.furmanlab.cs.huji.ac.il/demo/index.php).

Registration 

This web site is free and open to all users and there is no 
login requirement. However, users can supply an e-mail 
address (highly recommended), which allows them to 
receive a notification via e-mail after a simulation 
finishes, as well as a convenient link to the results page. 

System architecture 

The server runs on an AMD Sun Cluster of 40 CPUs. 
A single simulation takes ~3 min (depending
on the peptide and receptor sizes). Each user-submitted
job is distributed over 6 CPUs and finishes
within 1.5-2 h if the queue is empty. Data management
is based on a MySQL server (v.5.1.34).
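The quoted job duration follows from simple arithmetic on the numbers above. The sketch below assumes that the 200 default models are split evenly over the 6 CPUs and ignores pre-packing and result processing, so it is only a rough estimate.

```python
import math

def estimated_hours(n_models=200, minutes_per_model=3.0, n_cpus=6):
    """Rough wall-clock estimate for the production run of one job."""
    models_per_cpu = math.ceil(n_models / n_cpus)  # the busiest CPU sets the pace
    return models_per_cpu * minutes_per_model / 60.0

print(f"{estimated_hours():.1f} h")
```

With 200 models, 3 min per model and 6 CPUs, the busiest CPU handles 34 models, i.e. 102 min or about 1.7 h, which falls within the stated range.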